Model-based Direct Policy Search (Extended Abstract) Jan
Authors
Abstract
Scaling Reinforcement Learning (RL) to real-world problems with continuous state and action spaces remains a challenge. This is partly because the optimal value function can become quite complex in continuous domains. In this paper, we propose to avoid learning the optimal value function altogether and instead to use direct policy search methods in combination with model-based RL.
Similar resources
The Integrated Supply Chain of After-sales Services Model: A Multi-objective Scatter Search Optimization Approach
Abstract: In recent decades, the high profits of extended warranties have led third-party firms to regard them as a lucrative after-sales service. However, the segmentation of customers by risk aversion and the effect of offering an extended warranty on the manufacturer's basic warranty should be investigated when adjusting such services. Since risk-averse customers welcome extended warranty, while the cu...
Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning
Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. Policy search often requires a large number of samples to obtain a stable policy update estimator, which is prohibitive when sampling is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previo...
Neuro-Evolution for Multi-Agent Policy Transfer in RoboCup Keep-Away: (Extended Abstract)
An objective of transfer learning is to improve and speed up learning on target tasks after training on different but related source tasks. This research is a comparative study of Neuro-Evolution (NE) methods for transferring evolved multi-agent policies (behaviors) between multi-agent tasks of varying complexity. The efficacy of five variants of two NE methods is compared for multi-agent po...
Policy Search with High-Dimensional Context Variables
Direct contextual policy search methods learn to improve policy parameters and simultaneously generalize these parameters to different context or task variables. However, learning from high-dimensional context variables, such as camera images, is still a prominent problem in many real-world tasks. A naive application of unsupervised dimensionality reduction methods to the context variables, suc...